A Deep Bag-of-Features Model for Music Auto-Tagging
Authors
Abstract
Feature learning and deep learning have drawn great attention in recent years as a way of transforming input data into more effective representations using learning algorithms. Such interest has grown in the area of music information retrieval (MIR) as well, particularly in music audio classification tasks such as auto-tagging. In this paper, we present a two-stage learning model to effectively predict multiple labels from music audio. The first stage learns to project local spectral patterns of an audio track onto a high-dimensional sparse space in an unsupervised manner and summarizes the audio track as a bag-of-features. The second stage successively performs unsupervised learning on the bag-of-features in a layer-by-layer manner to initialize a deep neural network and finally fine-tunes it with the tag labels. Through experiments, we rigorously examine training choices and tuning parameters, and show that the model achieves high performance on Magnatagatune, a widely used dataset in music auto-tagging.
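As a rough illustration of the first stage described in the abstract, the following Python sketch projects local spectral frames onto a high-dimensional sparse space and summarizes a track as a single bag-of-features vector. Everything specific here is an assumption rather than the paper's method: the random dictionary stands in for whatever unsupervised feature learner is actually trained, the thresholded linear encoder and the max-pooling over time are illustrative choices, and the names encode_frames and bag_of_features are hypothetical.

```python
import numpy as np


def encode_frames(frames, dictionary, threshold=0.5):
    """Map each spectral frame to a sparse, non-negative code.

    frames:     (n_frames, n_bins) array of local spectral patterns
    dictionary: (n_atoms, n_bins) array; a stand-in for learned features
    threshold:  activations at or below this value are zeroed (assumed encoder)
    """
    activations = frames @ dictionary.T           # (n_frames, n_atoms)
    return np.maximum(activations - threshold, 0.0)


def bag_of_features(frames, dictionary, threshold=0.5):
    """Summarize a whole track by max-pooling the sparse codes over time."""
    codes = encode_frames(frames, dictionary, threshold)
    return codes.max(axis=0)                      # (n_atoms,)


# Synthetic data only: random "frames" and a random unit-norm dictionary.
rng = np.random.default_rng(0)
n_frames, n_bins, n_atoms = 200, 40, 256
frames = rng.standard_normal((n_frames, n_bins))
dictionary = rng.standard_normal((n_atoms, n_bins))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

track_vector = bag_of_features(frames, dictionary)
```

In a pipeline like the one the abstract outlines, a track-level vector of this kind would serve as the input to the second stage, where a deep network is pretrained layer by layer and then fine-tuned with the tag labels.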
Similar resources
Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms
Recently, the end-to-end approach that learns hierarchical representations from raw data using deep convolutional neural networks has been successfully explored in the image, text and speech domains. This approach has been applied to musical signals as well but has not been fully explored yet. To this end, we propose sample-level deep convolutional neural networks which learn representations from ve...
MIREX 2010 Audio Tag Classification via a Bag of Systems Representation
This paper describes an auto-tagging system presented to MIREX 2011 that is based on a “Bag of Systems” (BoS) representation of music. Similar to the Bag of Words representation for text documents, the BoS representation uses a dictionary of musical codewords, where each codeword is a generative model that captures timbral and temporal characteristics of music. Songs are represented as a BoS his...
Semantic Annotation and Retrieval of Music using a Bag of Systems Representation
We present a content-based auto-tagger that leverages a rich dictionary of musical codewords, where each codeword is a generative model that captures timbral and temporal characteristics of music. This leads to a higher-level, concise “Bag of Systems” (BoS) representation of the characteristics of a musical piece. Once songs are represented as a BoS histogram over codewords, traditional algorit...
Codebook-based Scalable Music Tagging with Poisson Matrix Factorization
Automatic music tagging is an important but challenging problem within MIR. In this paper, we treat music tagging as a matrix completion problem. We apply the Poisson matrix factorization model jointly on the vector-quantized audio features and a “bag-of-tags” representation. This approach exploits the shared latent structure between semantic tags and acoustic codewords. Leveraging the recently...
Improving Auto-tagging by Modeling Semantic Co-occurrences
Automatic taggers describe music in terms of a multinomial distribution over relevant semantic concepts. This paper presents a framework for improving automatic tagging of music content by modeling contextual relationships between these semantic concepts. The framework extends existing auto-tagging methods by adding a Dirichlet mixture to model the contextual co-occurrences between semantic mul...
Journal title:
- CoRR
Volume abs/1508.04999, Issue
Pages -
Publication date 2015